Multimodal Gesture Recognition via Multiple Hypotheses Rescoring

نویسندگان

  • Vassilis Pitsikalis
  • Athanasios Katsamanis
  • Stavros Theodorakis
  • Petros Maragos
چکیده

We present a new framework for multimodal gesture recognition that is based on a multiple hypotheses rescoring fusion scheme. We specifically deal with a demanding Kinect-based multimodal data set, introduced in a recent gesture recognition challenge (ChaLearn 2013), where multiple subjects freely perform multimodal gestures. We employ multiple modalities, that is, visual cues, such as skeleton data, color and depth images, as well as audio, and we extract feature descriptors of the hands’ movement, handshape, and audio spectral properties. Using a common hidden Markov model framework we build single-stream gesture models based on which we can generate multiple single stream-based hypotheses for an unknown gesture sequence. By multimodally rescoring these hypotheses via constrained decoding and a weighted combination scheme, we end up with a multimodally-selected best hypothesis. This is further refined by means of parallel fusion of the monomodal gesture models applied at a segmental level. In this setup, accurate gesture modeling is proven to be critical and is facilitated by an activity detection system that is also presented. The overall approach achieves 93.3% gesture recognition accuracy in the ChaLearn Kinect-based multimodal data set, significantly outperforming all recently published approaches on the same challenging multimodal gesture recognition task, providing a relative error rate reduction of at least 47.6%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Rescoring-Aware Beam Search for Reduced Search Errors in Contextual Automatic Speech Recognition

Using context in automatic speech recognition allows the recognition system to dynamically task-adapt and bring gains to a broad variety of use-cases. An important mechanism of contextinclusion is on-the-fly rescoring of hypotheses with contextual language model content available only in real-time. In systems where rescoring occurs on the lattice during its construction as part of beam search d...

متن کامل

Combined Hand Gesture — Speech Model for Human Action Recognition

This study proposes a dynamic hand gesture detection technology to effectively detect dynamic hand gesture areas, and a hand gesture recognition technology to improve the dynamic hand gesture recognition rate. Meanwhile, the corresponding relationship between state sequences in hand gesture and speech models is considered by integrating speech recognition technology with a multimodal model, thu...

متن کامل

Markerbased Gesture Recognition System in Dancing

In this paper, we report a real-time gesture driven interactive system with multimodal feedback for performing arts, especially dance. The system consists of two major parts: a gesture recognition engine and a multimodal feedback engine. The gesture recognition engine provides real-time recognition of the performer’s gesture based on the 3D marker coordinates from a marker-based motion capture ...

متن کامل

A Toolkit for Creating and Testing Multimodal Interface Designs

Designing and implementing applications that can handle multiple recognition-based interaction technologies such as speech and gesture inputs is a difficult task. IMBuilder and MEngine are the two components of a new toolkit for rapidly creating and testing multimodal interface designs. First, an interaction model is specified in the form of a collection of finite state machines, using a simple...

متن کامل

Bayesian Co-Boosting for Multi-modal Gesture Recognition

With the development of data acquisition equipment, more and more modalities become available for gesture recognition. However, there still exist two critical issues for multimodal gesture recognition: how to select discriminative features for recognition and how to fuse features from different modalities. In this paper, we propose a novel Bayesian Co-Boosting framework for multi-modal gesture ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Journal of Machine Learning Research

دوره 16  شماره 

صفحات  -

تاریخ انتشار 2015